Abstract

We study the complexity of the maximum coverage problem, restricted to set systems of
bounded VC-dimension. Our main result is a fixed-parameter tractable approximation scheme:
an algorithm that outputs a $(1-\epsilon)$-approximation to the maximum-cardinality union of $k$ sets,
in running time $O(f(\epsilon,k,d) \cdot \mathrm{poly}(n))$, where $n$ is the problem size, $d$ is the VC-dimension of
the set system, and $f(\epsilon,k,d)$ is exponential in $(kd/\epsilon)^c$ for some constant $c$. We complement
this positive result by showing that the function $f(\epsilon,k,d)$ in the running-time bound cannot
be replaced by a function depending only on $(\epsilon,d)$ or on $(k,d)$, under standard complexity
assumptions.

We also present an improved upper bound on the approximation ratio of the greedy algorithm 
in special cases of the problem, including when the sets have bounded cardinality and when they 
are two-dimensional halfspaces. Complementing these positive results, we show that when the 
sets are four-dimensional halfspaces neither the greedy algorithm nor local search is capable of 
improving the worst-case approximation ratio of $1 - 1/e$ that the greedy algorithm achieves on
arbitrary instances of maximum coverage.
1 Introduction



The Maximum Coverage problem is one of the classical NP-hard combinatorial optimization
problems. An instance of Maximum Coverage is specified by a triple $(U, \mathcal{R}, k)$ where $U$ is a
finite set, $\mathcal{R}$ is a collection of subsets of $U$, and $k$ is a positive integer. The objective is to output
a $k$-tuple of elements of $\mathcal{R}$ such that their union contains as many elements as possible. (In the
weighted version of the problem, elements $x \in U$ have non-negative weights $w(x)$ and the objective
is to maximize the combined weight of elements in the union.) A very natural greedy algorithm for
Maximum Coverage chooses $k$ sets sequentially, where each new set is chosen to maximize the
number (or combined weight) of elements covered by the new set but not by any of the preceding
ones. It has been known for decades that this algorithm has an approximation factor of $(1 - \frac{1}{e})$ [5];
in fact, it is a special case of the greedy algorithm for maximizing a monotone submodular function
subject to a cardinality constraint [14], and the algorithm's approximation factor remains $(1 - \frac{1}{e})$
even in this more general case. It was shown by Feige [7] that this approximation factor is the best
possible, even for the unweighted Maximum Coverage problem, unless P=NP.
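The greedy rule described above can be sketched in a few lines of Python. This is a minimal illustration for the unweighted problem, not code from the paper; the function name `greedy_max_coverage` and the toy instance are assumptions made for the example.

```python
from typing import Hashable, List, Set


def greedy_max_coverage(sets: List[Set[Hashable]], k: int) -> List[int]:
    """Return the indices of k sets chosen greedily by marginal coverage."""
    covered: Set[Hashable] = set()
    chosen: List[int] = []
    for _ in range(min(k, len(sets))):
        # Pick the unchosen set that covers the most not-yet-covered elements.
        best = max(
            (i for i in range(len(sets)) if i not in chosen),
            key=lambda i: len(sets[i] - covered),
        )
        chosen.append(best)
        covered |= sets[best]
    return chosen


# Hypothetical toy instance with k = 2.
example = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
picked = greedy_max_coverage(example, 2)
covered_elements = set().union(*(example[i] for i in picked))
```

On this toy instance greedy happens to find an optimal pair; the $(1 - \frac{1}{e})$ bound concerns worst-case instances.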

One of the reasons the greedy algorithm for Maximum Coverage is so widely studied is that 
it has innumerable applications: originally introduced by Cornuejols, Fisher, and Nemhauser [4] 
to model the problem of locating bank accounts in multiple cities to maximize "float," it was 
subsequently applied in databases [9], social networks [11], sensor placement [12], information 
retrieval [17, 19], and numerous other areas. A prototypical application arises in the information 
retrieval setting, when considering the problem of assembling a list of k documents to satisfy the 
information needs of as many users as possible. Equating every document with the set of users 
whom it satisfies, we see that this information retrieval problem is modeled by the Maximum 
Coverage problem. 

Given the extremely broad applicability of Maximum Coverage problems, it is natural to
wonder whether the approximation ratio of $1 - \frac{1}{e}$ is the strongest theoretical guarantee one can
hope for. Feige's hardness result eliminates the possibility of obtaining a better worst-case
approximation ratio in polynomial time, but the problem instances arising in applications are unlikely to
resemble worst-case instances of Maximum Coverage. Is it possible to identify broad classes of
Maximum Coverage instances (hopefully resembling those that arise in practice) such that the
greedy algorithm provably achieves an approximation factor better than $1 - \frac{1}{e}$ on these instances? If
not, can one design a different polynomial-time algorithm with an improved approximation factor?
These are the questions that inspired our paper.

Let us reconsider the problem of assembling a top-$k$ list of documents, mentioned above, in
light of these questions. At least two aspects of this application distinguish it from an arbitrary 
instance of Maximum Coverage. 

(1) The value k is quite small compared to n, the input size. A typical instance might involve 
processing a list of thousands or millions of documents to extract a list of k = 10 top choices. 

(2) The set system $(U, \mathcal{R})$ is likely to have a "low-dimensional" structure. For example, a natural
model of users' preferences might assume that there are $d \ll n$ topics, a document contains a
mix of topics described by a vector in $\mathbb{R}^d$, and the user's information need is satisfied if the dot
product of this vector with another vector in $\mathbb{R}^d$ (describing the mix of topics the user seeks
to read about) exceeds some threshold.
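The topic model in item (2) can be illustrated with a small sketch that turns document and user vectors into a set system. The vectors, the threshold value, and the helper name `satisfied_users` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Set


def satisfied_users(
    doc: Sequence[float], users: List[Sequence[float]], threshold: float
) -> Set[int]:
    """Indices of users whose dot product with the document exceeds the threshold."""
    return {
        j
        for j, user in enumerate(users)
        if sum(a * b for a, b in zip(doc, user)) > threshold
    }


# Hypothetical instance with d = 2 topics, three documents and three users.
docs = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
users = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
set_system = [satisfied_users(doc, users, threshold=0.5) for doc in docs]
```

Each document becomes the set of users it satisfies, so every set is the intersection of the user points with a halfspace in $\mathbb{R}^d$.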

Is the approximation ratio of the greedy algorithm better than $1 - \frac{1}{e}$ under these circumstances?
If not, is there some other algorithm that is significantly better? 

We answer the first question negatively and the second one affirmatively. More precisely, for
$d \ge 4$ we show that the greedy algorithm's approximation ratio in this special case is no better
than its worst-case approximation ratio of $1 - (1 - \frac{1}{k})^k \approx 1 - \frac{1}{e}$, but that there is an algorithm
with running time $O(f(\epsilon,k,d) \cdot \mathrm{poly}(n))$ whose approximation factor is $(1-\epsilon)$, for some function
$f(\epsilon,k,d)$. (Of course, for very small values of $k$ a trivial brute-force search over all collections of $k$
sets in $\mathcal{R}$ finds an exactly optimal solution in $O(n^{k+1})$ time, but a fixed-parameter algorithm whose
running time is exponential in $k$ but quadratic in $n$ is vastly faster when $k = 10$ and $n = 10^6$, for
instance.)

The following subsection describes our contributions in more detail. 

1.1 Our contributions 

Our main contribution is a fixed-parameter approximation scheme (fpt-AS) for the Maximum
Coverage problem, parameterized by the number of sets $k$, the approximation parameter $\epsilon$,
and the VC-dimension of the set system, $d$. Letting $n$ denote the problem size (i.e., the sum
of cardinalities of all the sets in $\mathcal{R}$), the approximation scheme has running time
$O(f(\epsilon,k,d) \cdot \mathrm{poly}(n))$, where $f(\epsilon,k,d) = \exp(\tilde{O}(k^2 d \epsilon^{-5}))$ (see footnote 1).
The algorithm, which is presented in Section 3,
is based on three ingredients. First, set systems of bounded VC-dimension have bounded-size
$\epsilon$-approximations (see Section 3 for definitions) and there is even a deterministic algorithm to find
them in linear time [3]. Second, this means it is easy to design an fpt-AS for the special case
of Maximum Coverage in which the set system has bounded VC-dimension and the optimum
solution covers a constant fraction of the elements. Third, the general case can be reduced to this
special case by an intricate non-deterministic algorithm, which can then be made deterministic at
the cost of blowing up the running time by a factor that is exponential in $k^2 d \epsilon^{-5}$, but independent
of $n$.

In Section 5 we show that various aspects of this result cannot be improved, under standard
complexity assumptions. First, the function $f(\epsilon,k,d)$ cannot be replaced by a function depending
polynomially on $k$ unless P = NP. Second, it cannot be replaced by a function depending
polynomially on $1/\epsilon$ unless P = W[1]. (The question of whether the exponential dependence on
$d$ can be eliminated is intriguing, but it is unlikely to be easily resolvable since fixed-parameter
complexity theory lacks machinery analogous to the PCP Theorem for proving W[1]-hardness of
approximation.) Furthermore, these hardness results apply even in some very simple cases: Maximum
Coverage with set systems of VC-dimension 2, or with halfspaces in dimension 4, or with
rectangles in dimension 2. Moreover, in all three of these special cases, the greedy algorithm fails
to achieve an approximation factor better than $1 - \frac{1}{e}$.

These negative results about the greedy algorithm are counterbalanced by some positive results
that we present in Section 4. We identify a parameter of the problem instance, the covering
multiplicity, denoted by $r$, such that the greedy algorithm's approximation factor is never worse
than $1 - (1 - \frac{1}{r})^r$. The covering multiplicity satisfies $r \le k$, and when the inequality is strict
this improves upon the worst-case approximation bound for the greedy algorithm. For problem
instances whose sets have cardinality at most $r$, the covering multiplicity is bounded by $r$, and for
instances in which the sets are two-dimensional halfspaces the covering multiplicity is bounded by
2, implying that the greedy algorithm is a $\frac{3}{4}$-approximation in the latter case.
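To make the dependence on the covering multiplicity concrete, the following sketch evaluates the guarantee $1 - (1 - \frac{1}{r})^r$ for a few values of $r$. The function name `greedy_guarantee` is an assumption made for this example.

```python
import math


def greedy_guarantee(r: int) -> float:
    """Approximation factor 1 - (1 - 1/r)^r for covering multiplicity r."""
    return 1.0 - (1.0 - 1.0 / r) ** r


# The guarantee decreases toward 1 - 1/e as r grows.
table = {r: greedy_guarantee(r) for r in (1, 2, 3, 10, 1000)}
limit = 1.0 - math.exp(-1.0)
```

For $r = 2$ the value is exactly $\frac{3}{4}$, which is the bound quoted for two-dimensional halfspaces.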

1.2 Related work 

As mentioned above, the Maximum Coverage problem was introduced, and the greedy algorithm
analyzed, by Cornuejols et al. in [4]. This work was subsequently generalized to the context of
submodular functions by Nemhauser et al. [14]. A matching hardness of approximation for Maximum
Coverage was obtained by Feige [7], in a paper that also settled the approximation hardness
of the closely related Set Cover problem, establishing that the greedy algorithm achieves the
optimal approximation ratio (up to lower order terms) for both problems.

Footnote 1: hiding log factors.

The approximability of special cases of Set Cover and Maximum Coverage was subsequently
investigated in numerous papers. For example, the Maximum Vertex Coverage problem
is the special case of Maximum Coverage in which $U$ is the edge set of a graph and every set in $\mathcal{R}$
is the set of edges incident to one vertex of that graph. This special case of the problem was shown
to be APX-hard by Petrank [16]. A landmark paper by Ageev and Sviridenko [1] introduced the
technique of pipage rounding and used it to give a (non-greedy) polynomial-time $\frac{3}{4}$-approximation
algorithm for Maximum Vertex Coverage; more generally, they gave a polynomial-time algorithm
with approximation factor $\left(1 - (1 - \frac{1}{k})^k\right)$ for the special case in which every element of $U$
belongs to at most $k$ sets in $\mathcal{R}$.

Computational geometers have intensively studied special cases of Set Cover, or the dual
problem of Hitting Set, when the set system is defined geometrically, e.g. by rectangles, disks, or
halfspaces. A seminal paper by Brönnimann and Goodrich [2] introduced a multiplicative-weights
method for approximating Hitting Set, and applied this method to design constant-factor
approximation algorithms for various classes of bounded-VC-dimensional set systems, e.g. disks in
the plane. The weighted case of these problems turns out to be much more challenging; see [8, 18].
A breakthrough paper by Mustafa and Ray [10] presented a new method to analyze local search
algorithms for geometric hitting set problems, thereby proving that local search yields a PTAS for
many interesting special cases such as three-dimensional halfspaces.

The study of fixed-parameter approximation schemes — and fixed-parameter approximation 
algorithms more generally — is still in its youth. An excellent survey by Marx [13] includes an 
fpt-AS for Maximum Vertex Coverage (also known as Partial Vertex Cover), a problem 
which is a special case of bounded-VC-dimensional Maximum Coverage. Thus, one consequence
of our algorithm in Section 3 is an alternative fpt-AS for Partial Vertex Cover, although the 
techniques underlying our algorithm are very different from those in Marx's algorithm. 

2 Preliminaries 

An instance of the Maximum Coverage problem is specified by a finite set $U$, a collection of
subsets $\mathcal{R}$, and a positive integer $k$. We will assume that the input is specified by simply listing
the elements of $U$ and those of each set in $\mathcal{R}$; thus, the problem size is $n = \sum_{R \in \mathcal{R}} |R|$. In the
Weighted Maximum Coverage problem, we are also given a function $w : U \to \mathbb{R}_+$; the weight
of a set $S \subseteq U$ is defined to be $w(S) = \sum_{x \in S} w(x)$ and the goal is to output a $k$-tuple of elements
of $\mathcal{R}$ whose union has maximum weight. We will denote this maximum by $\mathrm{OPT}(U)$.

For $A \subseteq U$, we will use the notation $\mathcal{R}|_A$ to denote the collection of all subsets $B \subseteq A$ of the
form $B = A \cap R$, where $R \in \mathcal{R}$. The set $A$ is shattered by $\mathcal{R}$ if $\mathcal{R}|_A$ is equal to $2^A$, the collection of
all subsets of $A$. The VC-dimension of $(U, \mathcal{R})$ is the cardinality of the largest set that is shattered
by $\mathcal{R}$. If $(U, \mathcal{R})$ has VC-dimension $d$ and $A \subseteq U$, it is known that $|\mathcal{R}|_A|$ is bounded above by
$O(|A|^d)$.
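The definitions of restriction, shattering, and VC-dimension can be checked by brute force on tiny instances. The sketch below is exponential in $|U|$ and is meant only to illustrate the definitions; the function names and the interval example are assumptions, and the empty-family case is arbitrarily reported as 0.

```python
from itertools import combinations


def restriction(family, A):
    """The collection R|_A of all traces A ∩ R, for R in the family."""
    A = frozenset(A)
    return {A & frozenset(R) for R in family}


def is_shattered(family, A):
    """A is shattered when every subset of A arises as a trace."""
    return len(restriction(family, A)) == 2 ** len(set(A))


def vc_dimension(universe, family):
    """Largest size of a shattered subset, found by exhaustive search."""
    universe = list(universe)
    d = 0
    for size in range(1, len(universe) + 1):
        if any(is_shattered(family, A) for A in combinations(universe, size)):
            d = size
        else:
            break  # subsets of shattered sets are shattered, so we can stop
    return d


# Hypothetical example: intervals on the points 1..4, plus the empty set.
points = [1, 2, 3, 4]
intervals = [set(range(a, b + 1)) for a in range(1, 5) for b in range(a, 5)] + [set()]
dimension = vc_dimension(points, intervals)
```

Intervals shatter every pair of points but no triple, since the trace containing only the two outer points of a triple is never realized.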

Our focus will be on Maximum Coverage problems such that $(U, \mathcal{R})$ has bounded
VC-dimension. Among these, two special cases of particular interest are Maximum Halfspace
Coverage, in which $U$ is a subset of $\mathbb{R}^d$ and each of the sets in $\mathcal{R}$ is the intersection of a
halfspace with $U$, and Maximum Rectangle Coverage, in which $U$ is again a subset of $\mathbb{R}^d$
and each of the sets in $\mathcal{R}$ is obtained by intersecting an axis-parallel rectangle with $U$.
3 A fixed-parameter approximation scheme



In this section, we work with the unweighted Maximum Coverage problem. Following Chazelle
and Matousek, we assume that $\mathcal{R}$ is represented by a subsystem oracle of dimension $d$, defined as
follows.

Definition 3.1 ([3]). A subsystem oracle of dimension $d$ for a set system $(U, \mathcal{R})$ is an algorithm
which, given a subset $A \subseteq U$, returns a list of all sets in $\mathcal{R}|_A$ in time $O(|A|^{d+1})$; the number of
sets in this list must also be bounded above by $O(|A|^d)$.

The following fact is obvious but useful: a subsystem oracle of dimension $d$ for $(U, \mathcal{R})$ also
constitutes a subsystem oracle of dimension $d$ for $(V, \mathcal{R}|_V)$, for every subset $V \subseteq U$.
We define a set $A \subseteq U$ to be an $\epsilon$-approximation of $(U, \mathcal{R})$ if the inequality

$$\left| \frac{|A \cap R|}{|A|} - \frac{|R|}{|U|} \right| \le \epsilon$$

holds for all $R \in \mathcal{R}$. A crucial ingredient of our approximation scheme is an algorithm, due to
Chazelle and Matousek [3], that computes an $\epsilon$-approximation of cardinality $O(d\epsilon^{-2}\log(d/\epsilon))$ for
$(U, \mathcal{R})$ in time $O(d^{3d}\epsilon^{-2d}\log^d(d/\epsilon)\, n)$, given a subsystem oracle of dimension $d$ for $(U, \mathcal{R})$.
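The defining inequality of an $\epsilon$-approximation is easy to verify directly for a candidate set $A$. The sketch below only checks the definition; it does not implement the Chazelle and Matousek construction. Exact rational arithmetic is used to avoid floating-point comparisons, and all names and the example instance are assumptions.

```python
from fractions import Fraction


def approximation_error(universe, family, sample):
    """Largest gap between the density of R in the sample and in the universe."""
    U, A = set(universe), set(sample)
    return max(
        abs(Fraction(len(A & set(R)), len(A)) - Fraction(len(U & set(R)), len(U)))
        for R in family
    )


def is_eps_approximation(universe, family, sample, eps):
    """True when the sample satisfies the defining inequality for every R."""
    return approximation_error(universe, family, sample) <= eps


# Hypothetical example: prefixes of {0, ..., 9}, sampled at the even points.
universe = range(10)
prefixes = [set(range(j)) for j in range(11)]
sample = {0, 2, 4, 6, 8}
error = approximation_error(universe, prefixes, sample)
```

Here the even points form a $\frac{1}{10}$-approximation of the prefix system, but not a $\frac{1}{20}$-approximation.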

Let $\mathcal{R}^{\cup k}$ denote the collection of all sets $R_1 \cup \cdots \cup R_k$ such that $R_1, \dots, R_k \in \mathcal{R}$. To apply
Chazelle and Matousek's algorithm, we will need a subsystem oracle for $\mathcal{R}^{\cup k}$. The existence of such
an oracle is ensured by the following lemma.

Lemma 3.2. If $(U, \mathcal{R})$ has a subsystem oracle of dimension $d$, then for all $k > 0$, $(U, \mathcal{R}^{\cup k})$ has a
subsystem oracle of dimension $kd$.

Proof. The proof is by induction on $k$, the base case $k = 1$ being trivial. Given subsystem oracles
for $(U, \mathcal{R})$ and $(U, \mathcal{R}^{\cup (k-1)})$ of dimensions $d$ and $(k-1)d$, respectively, the following simple algorithm
constitutes a subsystem oracle for $(U, \mathcal{R}^{\cup k})$. First, we use the given two subsystem oracles to
generate lists $Q_1$ and $Q_{k-1}$, consisting of all sets in $\mathcal{R}|_A$ and $\mathcal{R}^{\cup (k-1)}|_A$, respectively. Letting $a = |A|$,
the induction hypothesis implies that $|Q_1| = O(a^d)$ and $|Q_{k-1}| = O(a^{(k-1)d})$, and that the two lists
are generated in time $O(a^{d+1})$ and $O(a^{(k-1)d+1})$, respectively. Now, for every pair $B_1 \in Q_1$ and
$B_{k-1} \in Q_{k-1}$, we form the set $B = B_1 \cup B_{k-1}$ and add it to $Q_k$. There are $O(a^{kd})$ such pairs, and
for each pair the union can be computed in $O(a)$ time, so the algorithm runs in time $O(a^{kd+1})$, as
desired. □
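The inductive construction in the proof can be sketched as a higher-order function that lifts an oracle for $\mathcal{R}$ to an oracle for $\mathcal{R}^{\cup k}$. The base oracle used here is the naive one that intersects every listed set with $A$, which is an assumption made to keep the example self-contained; it does not meet the running-time requirement of Definition 3.1 in general.

```python
def trivial_oracle(family):
    """Naive oracle: intersect every set of the family with A."""
    def oracle(A):
        A = frozenset(A)
        return {A & frozenset(R) for R in family}
    return oracle


def union_oracle(oracle, k):
    """Oracle for k-fold unions, built as in the induction of Lemma 3.2."""
    def oracle_k(A):
        base = oracle(A)        # plays the role of Q_1
        current = set(base)     # plays the role of Q_{j}, starting at j = 1
        for _ in range(k - 1):
            # Q_{j+1} consists of all unions B_1 ∪ B_j.
            current = {b1 | bj for b1 in base for bj in current}
        return current
    return oracle_k


# Hypothetical example: singletons over {1, 2, 3}.
singletons = [{1}, {2}, {3}]
pairs = union_oracle(trivial_oracle(singletons), 2)({1, 2, 3})
triples = union_oracle(trivial_oracle(singletons), 3)({1, 2, 3})
```

Storing the traces in a Python set removes duplicates automatically, which a list-based implementation would have to do explicitly.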

As an easy consequence, we derive that the Maximum Coverage problem has an fpt-AS when
$(U, \mathcal{R})$ has a bounded-dimensional subsystem oracle and the optimum is a constant fraction of $|U|$.

Lemma 3.3. For any constants $c, \delta > 0$, consider the Maximum Coverage problem, restricted
to set systems $(U, \mathcal{R})$ having a subsystem oracle of dimension $d$ and satisfying $\mathrm{OPT}(U) \ge c|U|$.
This special case of the Maximum Coverage problem has a $\left(\frac{c - 2\delta}{c}\right)$-approximation algorithm with
running time bounded by $O(d^{3kd} k^{3kd} \delta^{-2kd-2} \log^{kd+1}(kd/\delta)\, n)$.

Proof. The set system $(U, \mathcal{R}^{\cup k})$ has a subsystem oracle of dimension $kd$, so it is possible to
compute a set $A \subseteq U$ which is a $\delta$-approximation to $(U, \mathcal{R}^{\cup k})$, in time $O(d^{3kd} k^{3kd} \delta^{-2kd} \log^{kd}(kd/\delta)\, n)$.
Furthermore, the cardinality of $A$ is $O(kd\delta^{-2}\log(kd/\delta))$. We can solve the Maximum
Coverage problem for the set system $(A, \mathcal{R}|_A)$ by brute force. First we call the subsystem
oracle to obtain a list of all the sets in $\mathcal{R}|_A$; there are at most $O(k^d d^d \delta^{-2d} \log^d(kd/\delta))$ such sets.
Then we enumerate all $k$-tuples of sets in this list, compute their union, and output the $k$-tuple
whose union has the largest cardinality. Computing the union of $k$ sets requires $O(k|A|) =
O(k^2 d \delta^{-2} \log(kd/\delta))$ time, and multiplying this by the number of $k$-tuples we obtain an overall
running time of $O(k^{kd+2} d^{kd+1} \delta^{-2kd-2} \log^{kd+1}(kd/\delta))$.

Let $R_1, \dots, R_k$ be sets in $\mathcal{R}$ whose restrictions to $A$ constitute an optimal solution of the Maximum
Coverage problem for $(A, \mathcal{R}|_A)$. Let $S_1, \dots, S_k$ be an optimal solution of the Maximum
Coverage problem for $(U, \mathcal{R})$. We have

$$\frac{|(R_1 \cup \cdots \cup R_k) \cap A|}{|A|} \ge \frac{|(S_1 \cup \cdots \cup S_k) \cap A|}{|A|}$$

$$\frac{|R_1 \cup \cdots \cup R_k|}{|U|} \ge \frac{|S_1 \cup \cdots \cup S_k|}{|U|} - 2\delta$$

$$\frac{|R_1 \cup \cdots \cup R_k|}{|S_1 \cup \cdots \cup S_k|} \ge 1 - 2\delta \left( \frac{|U|}{|S_1 \cup \cdots \cup S_k|} \right) \ge 1 - \frac{2\delta}{c} = \frac{c - 2\delta}{c}$$

where the first line follows from the construction of $R_1, \dots, R_k$, the second line follows from the
fact that $A$ is a $\delta$-approximation for $(U, \mathcal{R}^{\cup k})$, and the third line follows from our assumption
that $|S_1 \cup \cdots \cup S_k| = \mathrm{OPT}(U) \ge c|U|$. □
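The brute-force step of Lemma 3.3 can be sketched as follows. The sketch assumes that a suitable sample $A$ is already given; it does not compute a $\delta$-approximation, and the function name and toy instance are hypothetical. It groups the sets of $\mathcal{R}$ by their trace on $A$, searches all $k$-tuples of traces, and returns one original set per chosen trace.

```python
from itertools import combinations_with_replacement


def best_k_tuple_on_sample(family, sample, k):
    """Brute-force Maximum Coverage on (A, R|_A); returns sets of the family."""
    A = frozenset(sample)
    representative = {}
    for R in family:
        # Keep one original set for each distinct trace on the sample.
        representative.setdefault(A & frozenset(R), R)
    traces = list(representative)
    best = max(
        combinations_with_replacement(traces, k),
        key=lambda tup: len(frozenset().union(*tup)),
    )
    return [representative[trace] for trace in best]


# Hypothetical instance: universe {0, ..., 7}, sample of four points, k = 2.
family = [{0, 1, 2, 3}, {4, 5, 6, 7}, {0, 4}, {1, 5}]
solution = best_k_tuple_on_sample(family, sample={0, 2, 4, 6}, k=2)
covered = set().union(*solution)
```

The search space depends only on the number of distinct traces on $A$, not on $|U|$, which is what makes the running time of this step independent of $n$.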

For the remainder of this section, we work on eliminating the assumption that $\mathrm{OPT}(U) \ge c|U|$.
Our plan of attack is to perform a preprocessing step that extracts a subset $V \subseteq U$ such that
$\mathrm{OPT}(V) \ge (1 - \epsilon/3)\mathrm{OPT}(U)$ and $\mathrm{OPT}(V) \ge c|V|$, for a constant $c = c(\epsilon, k)$ depending only on $\epsilon$
and $k$. Then we will run the algorithm from Lemma 3.3 on $(V, \mathcal{R}|_V)$, using an appropriate choice
of $\delta = \delta(\epsilon, k)$, to obtain a $(1 - \epsilon)$-approximation to $\mathrm{OPT}(U)$.

To design and analyze the preprocessing algorithm that constructs V, we must first define a new 
problem that we call Constrained Maximum Coverage and analyze a simple greedy algorithm 
for the problem. 

Definition 3.4. An instance of the Constrained Maximum Coverage problem is specified by
a universe $U$ and $k$ collections of sets $\mathcal{R}_1, \dots, \mathcal{R}_k \subseteq 2^U$. A solution of the problem is specified by
designating a $k$-tuple of sets $R_1, \dots, R_k$ such that $R_i \in \mathcal{R}_i$ for $i = 1, \dots, k$. The objective is to
maximize $|R_1 \cup \cdots \cup R_k|$.

The greedy algorithm for Constrained Maximum Coverage selects $R_1, R_2, \dots, R_k$, in that
order, by choosing $R_1$ to be the maximum-cardinality set in $\mathcal{R}_1$ and, for $i \ge 2$, choosing $R_i$ to be
the set in $\mathcal{R}_i$ that maximizes $|R_i \setminus (R_1 \cup \cdots \cup R_{i-1})|$.

Note that Maximum Coverage is the special case of Constrained Maximum Coverage
in which $\mathcal{R}_1 = \cdots = \mathcal{R}_k$, and that the greedy algorithm specializes, in that case, to the familiar
greedy algorithm for Maximum Coverage. The approximation ratio of the greedy algorithm for
Constrained Maximum Coverage is not equal to $1 - \frac{1}{e}$ in general; in fact it is equal to $\frac{1}{2}$.
However, for our purposes the following property of the greedy algorithm will be more crucial to
the analysis.
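A minimal sketch of the greedy algorithm for Constrained Maximum Coverage follows, together with a small instance on which greedy covers only half of the optimum. The function name and the instance are assumptions made for illustration; ties are broken in favor of the first listed set.

```python
def constrained_greedy(collections):
    """Pick one set from each collection, in order, by marginal coverage."""
    covered = set()
    chosen = []
    for family in collections:
        # max() returns the first maximizer, so ties go to the earlier set.
        best = max(family, key=lambda R: len(set(R) - covered))
        chosen.append(best)
        covered |= set(best)
    return chosen


# Hypothetical instance: greedy takes {1, 2} first and is then forced to repeat it.
collections = [[{1, 2}, {3, 4}], [{1, 2}]]
greedy_choice = constrained_greedy(collections)
greedy_value = len(set().union(*greedy_choice))
optimal_value = len({3, 4} | {1, 2})
```

With an unlucky tie-break, greedy covers 2 elements while the optimum covers 4, which illustrates why the ratio cannot exceed one half for the constrained problem.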

Lemma 3.5. Given an instance of the Constrained Maximum Coverage problem, let $R_1, \dots, R_k$
be the sets selected by the greedy algorithm and let $S_1, \dots, S_k$ be any other solution. Let $\mathbf{R} =
R_1 \cup \cdots \cup R_k$ and $\mathbf{S} = S_1 \cup \cdots \cup S_k$. For every $\delta > 0$, at least one of the following two alternatives
holds.

1. $|\mathbf{R}| \ge (1-\delta)|\mathbf{S}|$.

2. $|\mathbf{S} \setminus \mathbf{R}| < (1-\delta)|\mathbf{S}|$.

Proof. We will construct a one-to-one mapping from $\mathbf{S} \setminus \mathbf{R}$ into $\mathbf{R}$. This suffices to prove the lemma,
since either $|\mathbf{S} \setminus \mathbf{R}| < (1-\delta)|\mathbf{S}|$ or $|\mathbf{S} \setminus \mathbf{R}| \ge (1-\delta)|\mathbf{S}|$, and in the latter case our one-to-one mapping
will certify that

$$|\mathbf{R}| \ge |\mathbf{S} \setminus \mathbf{R}| \ge (1-\delta)|\mathbf{S}|.$$

To construct the one-to-one mapping, partition $\mathbf{S} \setminus \mathbf{R}$ into $k$ sets $T_1, T_2, \dots, T_k$, where $T_i =
S_i \setminus (\mathbf{R} \cup S_1 \cup S_2 \cup \cdots \cup S_{i-1})$. Note that $T_i$ is a subset of $S_i \setminus (R_1 \cup \cdots \cup R_{i-1})$, hence

$$|T_i| \le |S_i \setminus (R_1 \cup \cdots \cup R_{i-1})| \le |R_i \setminus (R_1 \cup \cdots \cup R_{i-1})|,$$

where the second inequality follows from the definition of the greedy algorithm. This means that
there is a one-to-one mapping from $T_i$ to $R_i \setminus (R_1 \cup \cdots \cup R_{i-1})$. Combining these one-to-one
mappings gives us the desired one-to-one mapping from $\mathbf{S} \setminus \mathbf{R} = \bigcup_{i=1}^{k} T_i$ into
$\mathbf{R} = \bigsqcup_{i=1}^{k} \left( R_i \setminus (R_1 \cup \cdots \cup R_{i-1}) \right)$. □

We now describe and analyze a non-deterministic algorithm to solve Maximum Coverage on
a set system $(U, \mathcal{R})$, given a subsystem oracle of dimension $d$; later we will make the algorithm
deterministic. The algorithm proceeds in a sequence of phases numbered $1, \dots, p$, where
$p = \lceil \frac{6}{\epsilon} \ln(\frac{3}{\epsilon}) \rceil$. In
each phase $q$, it chooses a $k$-tuple of sets $R_1^q, \dots, R_k^q$. Let

$$\mathbf{R}^q = \bigcup_{j=1}^{q} \bigcup_{i=1}^{k} R_i^j.$$

In phase $q$, the algorithm computes a set $A^q$, of cardinality $O(kd\epsilon^{-2}p^2\log(kdp/\epsilon))$, which is an
$(\epsilon/6p)$-approximation to $(\mathbf{R}^{q-1}, \mathcal{R}^{\cup k})$. It non-deterministically guesses a sequence of $k$ subsets
$B_1^q, \dots, B_k^q \subseteq A^q$ and defines set systems $\mathcal{R}_1^q, \dots, \mathcal{R}_k^q$ as

$$\mathcal{R}_i^q = \{R \in \mathcal{R} \mid R \cap A^q = B_i^q\}, \quad i = 1, \dots, k.$$

It then selects the sets $R_1^q, \dots, R_k^q$ using the greedy algorithm for Constrained Maximum
Coverage, applied to the universe $U \setminus \mathbf{R}^{q-1}$ with set systems $\mathcal{R}_1^q, \dots, \mathcal{R}_k^q$. After repeating this process
for $p = \lceil \frac{6}{\epsilon} \ln(\frac{3}{\epsilon}) \rceil$ phases, it defines $V = \mathbf{R}^p = \bigcup_{q=1}^{p} \bigcup_{i=1}^{k} R_i^q$. Setting $c = 1/p$ and

$$\delta = \frac{\epsilon}{6p},$$

so that $(c - 2\delta)/c \ge 1 - \epsilon/3$, it runs the algorithm of Lemma 3.3 on the set system $(V, \mathcal{R})$ to find
a $(1 - \epsilon/3)$-approximation to the optimum of the Maximum Coverage problem for $(V, \mathcal{R})$.
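The parameters used by the algorithm can be computed explicitly from $\epsilon$. The sketch below rests on two assumptions that are inferred from the analysis rather than quoted from it: the number of phases is taken to be $p = \lceil \frac{6}{\epsilon} \ln \frac{3}{\epsilon} \rceil$, the smallest value for which $(1 - \epsilon/6)^p \le \epsilon/3$ follows from the standard exponential bound, and $\delta$ is taken to be $\epsilon/(6p)$, the largest value with $(c - 2\delta)/c \ge 1 - \epsilon/3$ when $c = 1/p$. The function name is hypothetical.

```python
import math


def phase_parameters(eps: float):
    """Return (p, c, delta) under the assumptions stated in the lead-in."""
    p = math.ceil((6.0 / eps) * math.log(3.0 / eps))  # assumed number of phases
    c = 1.0 / p                                       # OPT(V) >= c |V|
    delta = eps / (6.0 * p)                           # assumed choice of delta
    return p, c, delta


p, c, delta = phase_parameters(0.5)
residual_fraction = (1.0 - 0.5 / 6.0) ** p   # uncovered fraction after p phases
lemma_ratio = (c - 2.0 * delta) / c          # guarantee from Lemma 3.3
```

For $\epsilon = 0.5$ this gives $p = 22$ phases, and the ratio guaranteed by Lemma 3.3 equals $1 - \epsilon/3$ up to rounding.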

We aim to prove that there exists an execution of this non-deterministic algorithm that yields
a $(1 - \epsilon)$-approximation to the optimum of the Maximum Coverage problem for $(U, \mathcal{R})$. If our
algorithm produces a set $V$ satisfying $\mathrm{OPT}(V) \ge c|V| = |V|/p$ and $\mathrm{OPT}(V) \ge (1 - \epsilon/3)\mathrm{OPT}(U)$,
then Lemma 3.3 ensures that we finish up by producing a $(1 - \epsilon/3)$-approximation to $\mathrm{OPT}(V)$,
which will also be a $(1 - \epsilon/3)^2 \ge (1 - \epsilon)$-approximation to $\mathrm{OPT}(U)$. Proving that $\mathrm{OPT}(V) \ge |V|/p$
is easy: $V = \mathbf{R}^p$ is the union of $p$ sets $\mathbf{R}^q \setminus \mathbf{R}^{q-1}$, each of which has cardinality at most $\mathrm{OPT}(V)$
since it can be covered by the $k$ sets $R_1^q, \dots, R_k^q$.

To prove that there exists an execution yielding a set $V$ such that $\mathrm{OPT}(V) \ge (1 - \epsilon/3)\mathrm{OPT}(U)$,
we use Lemma 3.5. Let $S_1, \dots, S_k$ denote an optimal solution of the Maximum Coverage problem
for $(U, \mathcal{R})$. Consider the execution in which the algorithm's choice of $B_i^q$ is equal to $S_i \cap A^q$ for
every $q, i$. There are two cases to consider. First, suppose that there exists a phase $q$ such that

$$|(R_1^q \cup \cdots \cup R_k^q) \setminus \mathbf{R}^{q-1}| \ge \left(1 - \frac{\epsilon}{6}\right) |(S_1 \cup \cdots \cup S_k) \setminus \mathbf{R}^{q-1}|. \quad (1)$$

Recall that $\mathcal{R}_i^q = \{R \in \mathcal{R} \mid R \cap A^q = B_i^q\}$, and that we are assuming $B_i^q = A^q \cap S_i$. Hence, we have
$R_i^q \cap A^q = S_i \cap A^q$ for all $i$ and, consequently, $(R_1^q \cup \cdots \cup R_k^q) \cap A^q = (S_1 \cup \cdots \cup S_k) \cap A^q$. Using
the fact that $A^q$ is an $(\epsilon/6p)$-approximation for $(\mathbf{R}^{q-1}, \mathcal{R}^{\cup k})$, we now obtain

$$|(R_1^q \cup \cdots \cup R_k^q) \cap \mathbf{R}^{q-1}| \ge |(S_1 \cup \cdots \cup S_k) \cap \mathbf{R}^{q-1}| - \frac{\epsilon}{6p} |\mathbf{R}^{q-1}|. \quad (2)$$

Letting $S$ denote $S_1 \cup \cdots \cup S_k$, we sum (1) and (2) to obtain

$$|R_1^q \cup \cdots \cup R_k^q| \ge \left(1 - \frac{\epsilon}{6}\right) |S \setminus \mathbf{R}^{q-1}| + |S \cap \mathbf{R}^{q-1}| - \frac{\epsilon}{6p} |\mathbf{R}^{q-1}|
= |S| - \frac{\epsilon}{6} |S \setminus \mathbf{R}^{q-1}| - \frac{\epsilon}{6p} |\mathbf{R}^{q-1}|. \quad (3)$$

Now, as above, $\mathbf{R}^{q-1}$ can be partitioned into sets $\mathbf{R}^i \setminus \mathbf{R}^{i-1}$ $(i = 1, \dots, q-1)$, each having cardinality
at most $\mathrm{OPT}(U) = |S|$. The number of pieces of the partition is $q - 1 < p$, so $\frac{1}{p} |\mathbf{R}^{q-1}| < |S|$.
Substituting this back into (3), we obtain

$$\mathrm{OPT}(V) \ge |R_1^q \cup \cdots \cup R_k^q| \ge \left(1 - \frac{\epsilon}{6} - \frac{\epsilon}{6}\right) |S| = \left(1 - \frac{\epsilon}{3}\right) \mathrm{OPT}(U), \quad (4)$$

as desired.

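Finally, there remains the case that (1) is not satisfied by any $q$. Then Lemma 3.5 implies that

$$|(S_1 \cup \cdots \cup S_k) \setminus \mathbf{R}^{q}| < \left(1 - \frac{\epsilon}{6}\right) |(S_1 \cup \cdots \cup S_k) \setminus \mathbf{R}^{q-1}| \quad (5)$$

for all $q$. Combining (5) for $q = 1, \dots, p$, we get that

$$|(S_1 \cup \cdots \cup S_k) \setminus \mathbf{R}^{p}| < \left(1 - \frac{\epsilon}{6}\right)^p |S_1 \cup \cdots \cup S_k| \le \frac{\epsilon}{3} |S_1 \cup \cdots \cup S_k|,$$

which implies that

$$|(S_1 \cup \cdots \cup S_k) \cap \mathbf{R}^{p}| > \left(1 - \frac{\epsilon}{3}\right) |S_1 \cup \cdots \cup S_k|,$$

and hence $\mathrm{OPT}(V) \ge (1 - \epsilon/3)\mathrm{OPT}(U)$ since $V = \mathbf{R}^p$.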

To turn the non-deterministic algorithm into a deterministic one, we simply run every possible
execution of the non-deterministic algorithm and output the best answer. An execution of the
non-deterministic algorithm is determined by the choice of sets $B_i^q$ $(1 \le q \le p,\ 1 \le i \le k)$. Recall
that $B_i^q$ must be a subset of $A^q$ and that $|A^q| = O(kd\epsilon^{-2}p^2\log(kdp/\epsilon))$. Hence if $N(k, d, \epsilon)$ denotes
the number of executions of the non-deterministic algorithm, it satisfies

$$N(k, d, \epsilon) = \prod_{q=1}^{p} \prod_{i=1}^{k} 2^{|A^q|}$$

$$\log N(k, d, \epsilon) \le pk \cdot O(kd\epsilon^{-2}p^2\log(kdp/\epsilon))
= O(k^2 d \epsilon^{-2} p^3 \log(kdp/\epsilon))
= \tilde{O}(k^2 d \epsilon^{-5}).$$

Each iteration runs in time $O(g(k, d, \epsilon) \cdot n)$ where $\log g(k, d, \epsilon) = O(kd\log(kd/\epsilon))$. Hence, the
algorithm's overall running time is $O(f(k, d, \epsilon) \cdot n)$ where $\log f(k, d, \epsilon) = \log N(k, d, \epsilon) + \log g(k, d, \epsilon) =
\tilde{O}(k^2 d \epsilon^{-5})$.

In deriving this bound on the algorithm's running time, we have assumed that $(U, \mathcal{R})$ has a
subsystem oracle of dimension $d$. If we instead assume that $(U, \mathcal{R})$ has VC-dimension $d$ and is
represented in the input by simply listing all the elements of $\mathcal{R}$, the running time increases by a
factor of $n$. This is because the trivial implementation of a subsystem oracle (computing $\mathcal{R}|_A$
by enumerating each set of $\mathcal{R}$ and intersecting it with $A$) has running time $O(|A|^{d+1} n)$, $n$ times
slower than the bound required by the definition of a subsystem oracle.
4 Bounded Covering Multiplicity



In this section we show that the greedy algorithm gives a $(1 - (1 - 1/r)^r)$-approximate solution when
the covering multiplicity of the set system is at most $r$.

Definition 4.1. An instance of the maximum coverage problem $(U, \mathcal{R}, k)$ has covering multiplicity
$r$ if for every $k$-tuple of sets $a_1, \dots, a_k \in \mathcal{R}$ there exists an optimal solution $(o_1, \dots, o_k)$ of the
maximum coverage problem, with union denoted by OPT, such that each of the sets $a_i \cap \mathrm{OPT}$ for
$1 \le i \le k$ is contained in the union of $r$ elements of $\{o_1, \dots, o_k\}$.

One of the interesting special cases which satisfies this property is when the cardinality of every
set in $\mathcal{R}$ is bounded by $r$. In Appendix B we prove that it is also satisfied (with $r = 2$) when
$U \subseteq \mathbb{R}^2$ and $\mathcal{R}$ consists of halfspaces in $\mathbb{R}^2$.

Let $g_1, g_2, \dots, g_k$ be the $k$ sets chosen by the greedy algorithm in the order that they are
chosen. Let $w$ be the coverage function and $o_1, o_2, \dots, o_k$ be the $k$ sets chosen by OPT.

Theorem 4.2. The greedy algorithm is a $(1 - (1 - 1/r)^r)$-approximation algorithm for Maximum
Coverage with covering multiplicity $r$.

Corollary 4.3. The greedy algorithm is a $(1 - (1 - 1/r)^r)$-approximation algorithm for Maximum
Coverage with each set having cardinality at most $r$.

4.1 Reduction to a special case 

To simplify the analysis, we first argue that we can consider the following special case without loss
of generality. We take the problem instance on which the greedy algorithm (which we henceforth 
abbreviate as "greedy" ) has a given approximation ratio and convert it into a special instance with 
no better approximation ratio. Then it is enough to analyze the special case. 

• The sets chosen by greedy are different from the optimal sets. This assumption can be made 
as we can just duplicate the sets. Note that this does not change the covering multiplicity. 

• The sets chosen by greedy are disjoint. This is because if one defines a new problem instance
with $g_i' = g_i \setminus \left( \bigcup_{j<i} g_j \right)$ then the values of the optimal solution and the greedy solution are
unchanged. Note that this step uses the fact that the sets chosen by greedy do not belong to
the optimal solution. Also note that this does not change the covering multiplicity since we
are not modifying any sets in the optimal solution.

• Let $o_1, o_2, \dots, o_k$ be any optimal solution such that each set $g_i \cap \mathrm{OPT}$ $(1 \le i \le k)$ is contained
in the union of $r$ elements of $\{o_1, \dots, o_k\}$. We can assume that these sets $o_i$ are pairwise
disjoint. This is because we can define a new problem instance in which every point belonging
to two or more of the sets in $\{o_1, \dots, o_k\}$ is assigned to one of those sets and deleted from
the others. The values of the greedy and optimal solutions are unchanged. To preserve the
property that each set $g_i \cap \mathrm{OPT}$ $(1 \le i \le k)$ is contained in the union of $r$ elements of
$\{o_1, \dots, o_k\}$, we simply ensure that every element of $g_i$ is assigned to one of those $r$ sets, for
all $i$. This is possible due to our previous assumption that the sets $g_1, \dots, g_k$ are disjoint.

4.2 Simple case 

Consider the simple case $k = t \cdot r$ for some integer $t$. We will prove the approximation for this
special case to get some intuition. We will do it in steps.

• Let $X_l = \sum_{i=(l-1)t+1}^{l \cdot t} w(g_i)$.

• Let $o_1, o_2, \dots, o_k$ be the optimal sets in decreasing order of $w(o_i)$.

• Note that $w(g_1) \ge w(o_1)$, $w(g_2) \ge w(o_{r+1})$, $w(g_3) \ge w(o_{2r+1})$, $\dots$, $w(g_t) \ge w(o_{(t-1)r+1})$.
These inequalities use the fact that the covering multiplicity is $r$ and the sets $o_1, \dots, o_k$ are
disjoint. Now summing the $t$ terms we get $\sum_{i=1}^{t} w(g_i) \ge \sum_{i=1}^{t} w(o_{(i-1)r+1}) \ge \frac{1}{r} w(\mathrm{OPT})$.

• Repeating the above step on the residual problem we get $\sum_{i=t+1}^{2t} w(g_i) \ge \frac{w(\mathrm{OPT}) - X_1}{r}$. Similarly
we get the following series of inequalities.

$$\forall\, 1 \le l \le r-1, \quad \sum_{i=l \cdot t+1}^{(l+1) \cdot t} w(g_i) \ge \frac{w(\mathrm{OPT}) - \sum_{j=1}^{l} X_j}{r} \quad (6)$$

• Multiplying (6) by $(1 - 1/r)^{r-l-1}$ and summing we get $\sum_{i=1}^{k} w(g_i) \ge \left(1 - (1 - \frac{1}{r})^r\right) w(\mathrm{OPT})$.
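The effect of chaining the block inequalities can be checked numerically. The sketch below assumes each block gains exactly a $\frac{1}{r}$ fraction of the value still missing, which is the extremal case of the inequalities above; the function name is hypothetical.

```python
def worst_case_block_total(opt: float, r: int) -> float:
    """Total greedy value when every block inequality holds with equality."""
    total = 0.0
    for _ in range(r):
        # Each of the r blocks gains a 1/r fraction of the remaining gap.
        total += (opt - total) / r
    return total


checks = {
    r: (worst_case_block_total(1.0, r), 1.0 - (1.0 - 1.0 / r) ** r)
    for r in (1, 2, 3, 5, 8)
}
```

In every case the accumulated value matches $1 - (1 - \frac{1}{r})^r$, since the remaining gap shrinks by a factor of $(1 - \frac{1}{r})$ per block.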
4.3 General case 

Let $k = t \cdot r + q$ for some $0 \le q \le r - 1$. We will use the following lemma in the proof.

Lemma 4.4. For all $1 \le l \le r$ and $0 \le z \le q \le r$, we have
$r \cdot \sum_{m=0}^{z} \binom{l-1}{m} \binom{r-l}{q-m-1} \ge q \cdot \sum_{m=0}^{z} \binom{l-1}{m} \binom{r-l+1}{q-m}$.

Proof. Consider $p(z) = \frac{\sum_{m=0}^{z} \binom{l-1}{m} \binom{r-l}{q-m-1}}{\sum_{m=0}^{z} \binom{l-1}{m} \binom{r-l+1}{q-m}}$. Consider a random process in which $q$ out of $r$ bins are
chosen uniformly at random (without replacement) and a ball is added to each one of the $q$ bins.
Now $p(z)$ represents the conditional probability that a ball is in the $l$-th bin, given that at most $z$
bins from the first $l - 1$ are chosen. One can easily see that this function should be a decreasing
function of $z$ and hence $p(z) \ge p(q) = \frac{q}{r}$. □
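The probabilistic claim in the proof can be sanity-checked numerically for small $r$. The sketch below assumes the counting interpretation suggested by the bins-and-balls process: the number of ways to choose $q$ of $r$ bins with exactly $m$ of the first $l - 1$ bins chosen, with or without bin $l$. The exact form of the inequality is therefore an assumption, and the helper names are hypothetical.

```python
from math import comb


def binom(n: int, k: int) -> int:
    """Binomial coefficient that is zero outside the range 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def lemma_holds(r: int, q: int, l: int, z: int) -> bool:
    """Check r * P(bin l chosen, at most z early bins) >= q * P(at most z early bins)."""
    chosen_l = sum(binom(l - 1, m) * binom(r - l, q - m - 1) for m in range(z + 1))
    any_case = sum(binom(l - 1, m) * binom(r - l + 1, q - m) for m in range(z + 1))
    return r * chosen_l >= q * any_case


all_ok = all(
    lemma_holds(r, q, l, z)
    for r in range(1, 9)
    for q in range(0, r + 1)
    for l in range(1, r + 1)
    for z in range(0, q + 1)
)
```

Both sides share the common factor $1/\binom{r}{q}$, so the check can be done in exact integer arithmetic.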

Consider $r$ bins and arrange the $k$ greedy sets in the $r$ bins with each bin having either $t$ or
$t + 1$ greedy sets. Let bin 1 have the first $t$ or $t + 1$ sets, bin 2 have the second $t$ or $t + 1$ sets, and
so on. Let $\sigma$ be one such arrangement. We will apply inequalities similar to the simpler case. Let
$X_l^{\sigma} = \sum_{g_j \in \text{bin } l} w(g_j)$. Let $\sigma(l)$ denote the number of sets in the first $l$ bins. Let $x^{t}_{q\min}$ be the residual
value of the $q$-th minimum set among the optimal sets after the first $t$ greedy sets are chosen. Note
that $x^{t}_{q\min}$ is a decreasing function of $t$. Let $B(t)$ denote the set of bins with $t$ sets and $B(t + 1)$
denote the set of bins with $t + 1$ sets.

• Consider bin $l$ with $t + 1$ items. Then

$$\sum_{g_j \in \text{bin } l} w(g_j) \ge \frac{w(\mathrm{OPT}) - \sum_{i=1}^{l-1} X_i^{\sigma} + (r - q)\, x^{\sigma(l-1)}_{q\min}}{r}.$$

This inequality is proved similarly to inequality (6).

• Consider bin $l$ with $t$ items. Then

$$\sum_{g_j \in \text{bin } l} w(g_j) \ge \frac{w(\mathrm{OPT}) - \sum_{i=1}^{l-1} X_i^{\sigma} - q\, x^{\sigma(l-1)}_{q\min}}{r}.$$

This inequality is proved similarly to inequality (6).

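Multiplying the inequality corresponding to bin $l$ by $(1 - 1/r)^{r-l}$ and summing we get

$$w(\text{greedy}) \ge \left(1 - (1 - 1/r)^r\right) w(\mathrm{OPT})
+ \sum_{\text{bin } l \in B(t+1)} \frac{(1 - 1/r)^{r-l}}{r} (r - q)\, x^{\sigma(l-1)}_{q\min}
- \sum_{\text{bin } l \in B(t)} \frac{(1 - 1/r)^{r-l}}{r}\, q\, x^{\sigma(l-1)}_{q\min}.$$

Now taking the average over all arrangements $\sigma$ we get the following inequalities, in which a binomial
coefficient with a negative lower index is taken to be zero.

$$w(\text{greedy}) \ge \left(1 - (1 - 1/r)^r\right) w(\mathrm{OPT})
+ \sum_{l=1}^{r} \frac{(1 - 1/r)^{r-l}}{r \binom{r}{q}} \left( (r - q) \sum_{w=0}^{q} \binom{l-1}{w} \binom{r-l}{q-w-1} x^{(l-1)t+w}_{q\min}
- q \sum_{w=0}^{q} \binom{l-1}{w} \binom{r-l}{q-w} x^{(l-1)t+w}_{q\min} \right)$$

$$= \left(1 - (1 - 1/r)^r\right) w(\mathrm{OPT})
+ \sum_{l=1}^{r} \frac{(1 - 1/r)^{r-l}}{r \binom{r}{q}} \left( r \sum_{w=0}^{q} \binom{l-1}{w} \binom{r-l}{q-w-1} x^{(l-1)t+w}_{q\min}
- q \sum_{w=0}^{q} \binom{l-1}{w} \binom{r-l+1}{q-w} x^{(l-1)t+w}_{q\min} \right)$$

where the second form follows from the identity $\binom{r-l+1}{q-w} = \binom{r-l}{q-w} + \binom{r-l}{q-w-1}$.

Now using the fact that $x^{t}_{q\min}$ is a decreasing function of $t$ and Lemma 4.4 we get $w(\text{greedy}) \ge
\left(1 - (1 - 1/r)^r\right) w(\mathrm{OPT})$.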



5 Lower bounds 

This section considers three different low-dimensional restrictions of Maximum Coverage: set
systems of VC-dimension 2, halfspaces in $\mathbb{R}^4$, and axis-parallel rectangles in $\mathbb{R}^2$. In each case, we
show that the problem is APX-hard and that the greedy algorithm's approximation ratio, restricted
to that special case, is no better than its worst-case approximation ratio, $1 - \frac{1}{e}$.

All of these lower bounds are based on the Maximum Vertex Coverage problem, the special 
case of Maximum Coverage in which each element of $U$ belongs to exactly two sets in $\mathcal{R}$. In
this special case, we can identify $\mathcal{R}$ with the vertex set of a graph $G$, and $U$ with its edge set,
such that the endpoints of the edge corresponding to $x \in U$ are the vertices that correspond to
the two sets containing $x$. Thus, Maximum Vertex Coverage can be defined as the problem
of choosing k vertices of a graph to maximize the number of edges they cover. The problem is 
known to be APX-hard [16] and it is known that the approximation ratio of the greedy algorithm, 
specialized to Maximum Vertex Coverage, is no better than in the general case [5]. In fact, 
the following lemma shows that the performance of the greedy algorithm does not improve when 
we further specialize to bipartite instances of Maximum Vertex Coverage. 
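
The greedy rule, specialized to Maximum Vertex Coverage, can be sketched as follows. This is a minimal illustration rather than code from the paper: the function name `greedy_max_vertex_coverage`, the edge-list input format, and the tie-breaking rule (first vertex encountered) are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_max_vertex_coverage(edges: List[Edge], k: int) -> Tuple[List[Hashable], int]:
    """Pick up to k vertices greedily; return (chosen vertices, number of edges covered)."""
    # Identify each vertex with the set of indices of the edges incident to it.
    incident: Dict[Hashable, Set[int]] = {}
    for idx, (u, v) in enumerate(edges):
        incident.setdefault(u, set()).add(idx)
        incident.setdefault(v, set()).add(idx)

    covered: Set[int] = set()
    chosen: List[Hashable] = []
    for _ in range(min(k, len(incident))):
        # Choose the unchosen vertex covering the most not-yet-covered edges.
        best = max(
            (v for v in incident if v not in chosen),
            key=lambda v: len(incident[v] - covered),
        )
        chosen.append(best)
        covered |= incident[best]
    return chosen, len(covered)


# A path a-b-c-d-e: two vertices suffice to cover all four edges.
path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
chosen, value = greedy_max_vertex_coverage(path, 2)
```

On this small path the greedy choice happens to be optimal; the lemma below concerns instances where it is not.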

Lemma 5.1. For any $\varepsilon > 0$, there exist instances of Maximum Vertex Coverage in which the
graph is bipartite, the instance has a vertex cover of size $k$, but the output of the greedy algorithm
covers only a $1 - \frac{1}{e} + \varepsilon$ fraction of the edges.

The proof consists of taking a well-known hard example for the greedy Maximum Coverage
algorithm and encoding it in the form of a bipartite graph; the details are given in Appendix A.1.

Theorem 5.2. Each of the following special cases of Maximum Coverage is APX-hard: 

(a) Set systems of VC-dimension $d \ge 2$.

(b) Halfspaces in $\mathbb{R}^d$, $d \ge 4$.

(c) Rectangular ranges in $\mathbb{R}^d$, $d \ge 2$.

Furthermore, the worst-case approximation ratio of the greedy algorithm, when restricted to any of
these special cases, is $1 - \frac{1}{e}$.

Proof Sketch. The full details of the proof are given in Appendix A.1. Part (a) is a restatement of the
known results on Maximum Vertex Coverage. To show Part (b), we embed Maximum Vertex
Coverage into halfspaces in $\mathbb{R}^d$, $d \ge 4$, which yields the same results. For the APX-hardness
in Part (c), we use a reduction from Bounded-Degree Vertex Cover, which was shown to
be APX-hard by Papadimitriou and Yannakakis [15]. For the statement about the approximation
ratio of the greedy algorithm, we use Lemma 5.1. □

An immediate corollary of Theorem 5.2 is the following statement, which shows that the
super-polynomial dependence of the running time of our fixed-parameter algorithm on $k$ and $1/\varepsilon$ is
unavoidable.

Corollary 5.3. Suppose that Maximum Coverage, specialized to instances with VC-dimension
$d$, has a $(1 - \varepsilon)$-approximation algorithm with running time $O(f(\varepsilon,k,d) \cdot \mathrm{poly}(n))$, for every $\varepsilon, k$.
If P $\ne$ NP, then $f(\varepsilon,k,d)$ must be super-polynomial in $k$. If P $\ne$ W[1], then $f(\varepsilon,k,d)$ must be
super-polynomial in $\varepsilon^{-1}$. In fact, both of these statements hold even if we restrict to $d = 2$.

Proof. The statement that $f(\varepsilon, k, 2)$ must be super-polynomial in $k$ is a restatement of the
APX-hardness of Maximum Coverage in VC-dimension 2, which is Part (a) of Theorem 5.2. To prove
that $f(\varepsilon, k, 2)$ must be super-polynomial in $\varepsilon^{-1}$, we observe that Maximum Coverage, specialized
to instances with VC-dimension 2, is a generalization of the W[1]-hard Partial Vertex Cover
problem, and that approximating the optimum of Partial Vertex Cover within a factor of $(1 - \varepsilon)$,
for $\varepsilon < 1/|U|$, is equivalent to solving it exactly. □
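
The last step of the proof can be spelled out as follows. This is a sketch under the assumptions that the instance is unweighted (so coverage values are integers), that the threshold in the proof is $\varepsilon < 1/|U|$, and that the optimum is positive; the names ALG and OPT for the two coverage values are introduced here for illustration.

```latex
\[
\mathrm{ALG} \;\ge\; (1-\varepsilon)\,\mathrm{OPT}
\;=\; \mathrm{OPT} - \varepsilon\,\mathrm{OPT}
\;>\; \mathrm{OPT} - \frac{\mathrm{OPT}}{|U|}
\;\ge\; \mathrm{OPT} - 1 ,
\]
```

where the final inequality uses $\mathrm{OPT} \le |U|$. Since ALG is an integer with $\mathrm{OPT} - 1 < \mathrm{ALG} \le \mathrm{OPT}$, it must equal OPT.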

6 Open Questions 

We leave several interesting open questions. 

• Improve the running time of our algorithm for sets with bounded VC-dimension. 

• Give an algorithm with approximation ratio better than $1 - (1 - 1/r)^r$ when the cardinality of each set
is bounded by $r$. Such an algorithm could have a running time exponential in $r$.

• Resolve the approximability of max-coverage on 3-dimensional halfspaces. We conjecture that 
local search is a PTAS for the problem. Appendix B.2 presents a proof of the two-dimensional 
version of this conjecture. 
A Appendix



A.1 Missing Proofs from Section 5

Lemma A.1. For any $\varepsilon > 0$, there exist instances of Maximum Vertex Coverage in which the
graph is bipartite, the instance has a vertex cover of size $k$, but the output of the greedy algorithm
covers only a $1 - \frac{1}{e} + \varepsilon$ fraction of the edges.

Proof. We construct a bipartite graph with edge set $E = \{1, \ldots, kN\}$ (for some sufficiently large
$N$) and vertex set $U \cup W$, where $k = |U| < |W|$. We refer to $U$ and $W$ as the left and right vertex
sets, respectively.

Define a sequence of positive integers $n_0, n_1, n_2, \ldots$ by the formula
$n_i = \lceil N (1 - \frac{1}{k})^i \rceil + 1$, and let $s_i = \sum_{j=0}^{i-1} n_j$ denote the sequence of partial sums,
interpreting $s_0$ to be 0. If $r = \min\{i \mid s_i \ge kN\}$
then $W = \{w_1, \ldots, w_r\}$, while $U = \{u_1, \ldots, u_k\}$. The right endpoint of edge $j$ is the unique $w_i$
such that $s_{i-1} < j \le s_i$, while the left endpoint of $j$ is the unique $u_i$ such that $i \equiv j \pmod{k}$.

By construction, $|U| = k$ and $U$ is a vertex cover. Each element of $U$ covers exactly $N$
edges. However, the greedy algorithm instead chooses vertices $w_1, \ldots, w_k$. To prove this by
induction, observe that after choosing $w_1, \ldots, w_i$, the number of remaining uncovered edges is less
than $kN (1 - \frac{1}{k})^i$, and these edges are consecutively numbered. Each element of $U$ covers a
congruence class of edges, and therefore it covers fewer than $N (1 - \frac{1}{k})^i + 1$ of the remaining edges,
whereas $w_{i+1}$ covers $n_i$ edges and $n_i \ge N (1 - \frac{1}{k})^i + 1$. It follows that the greedy algorithm chooses
$w_{i+1}$, and this completes the induction step.

The number of edges covered by $w_1, \ldots, w_k$ is bounded above by
$2k + N \sum_{i=0}^{k-1} (1 - \frac{1}{k})^i = 2k + kN [1 - (1 - \frac{1}{k})^k]$.
For $k, N$ sufficiently large, this is less than $(1 - \frac{1}{e} + \varepsilon) kN$. □
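
The construction can be checked numerically on a small instance. The sketch below is illustrative only: it assumes the sequence is $n_i = \lceil N(1 - 1/k)^i \rceil + 1$, which is consistent with the bound $n_i \ge N(1 - 1/k)^i + 1$ used in the proof, and the helper names `build_instance` and `greedy` are hypothetical.

```python
import math
from fractions import Fraction
from typing import Dict, List, Set, Tuple


def build_instance(k: int, N: int) -> Tuple[Dict[str, Set[int]], List[str]]:
    """Return (vertex name -> set of incident edges, names of the left vertices)."""
    total = k * N
    sets: Dict[str, Set[int]] = {}
    # Left vertex u_i covers the edges congruent to i modulo k.
    left = [f"u{i}" for i in range(1, k + 1)]
    for i in range(1, k + 1):
        sets[f"u{i}"] = {j for j in range(1, total + 1) if j % k == i % k}
    # Right vertex w_{i+1} covers the next n_i consecutively numbered edges.
    start, i = 0, 0
    while start < total:
        n_i = math.ceil(N * Fraction(k - 1, k) ** i) + 1  # assumed formula for n_i
        end = min(start + n_i, total)
        sets[f"w{i + 1}"] = set(range(start + 1, end + 1))
        start, i = end, i + 1
    return sets, left


def greedy(sets: Dict[str, Set[int]], k: int) -> Tuple[List[str], int]:
    """Run k greedy steps; ties go to the first vertex in dictionary order."""
    covered: Set[int] = set()
    chosen: List[str] = []
    for _ in range(k):
        best = max(sets, key=lambda v: len(sets[v] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, len(covered)


k, N = 10, 1000
sets, left = build_instance(k, N)
chosen, greedy_value = greedy(sets, k)
optimal_value = len(set().union(*(sets[u] for u in left)))
ratio = greedy_value / optimal_value
```

With these parameters the greedy algorithm selects only right vertices, and `ratio` is close to $1 - (1 - 1/k)^k$, as the proof predicts.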

Theorem A.2. Each of the following special cases of Maximum Coverage is APX-hard:

(a) Set systems of VC-dimension $d \ge 2$.

(b) Halfspaces in $\mathbb{R}^d$, $d \ge 4$.

(c) Rectangular ranges in $\mathbb{R}^d$, $d \ge 2$.

Furthermore, the worst-case approximation ratio of the greedy algorithm, when restricted to any of
these special cases, is $1 - \frac{1}{e}$.

Proof. Recall that Maximum Vertex Coverage can be defined as the special case of Maximum
Coverage in which every $x \in U$ belongs to exactly two sets in $\mathcal{R}$. Any such set system $(U, \mathcal{R})$
has VC-dimension at most 2: indeed, if $\mathcal{R}$ shatters a three-element set $\{x,y,z\}$ then there exist
sets $R_1, \ldots, R_4$ in $\mathcal{R}$ whose intersections with $\{x,y,z\}$ are the sets $\{x\}, \{x,y\}, \{x,z\}, \{x,y,z\}$,
respectively, and consequently $x$ belongs to at least four distinct sets in $\mathcal{R}$. Thus, we see that
Maximum Coverage restricted to set systems of VC-dimension $d$ includes Maximum Vertex
Coverage as a special case, as long as $d \ge 2$. Part (a) of the theorem now follows from the fact
that Maximum Vertex Coverage is APX-hard [16] and from Lemma 5.1.

To prove Part (b) we again show that Maximum Vertex Coverage is a special case. To do
so, consider any graph $G$ with vertex set $V = \{v_1, \ldots, v_n\}$ and associate to each vertex $v_t \in V$ the
vector $\mathbf{v}_t = (t, t^2, t^3, t^4, 0, \ldots, 0) \in \mathbb{R}^d$. Define a halfspace $h_t \subset \mathbb{R}^d$ by the inequality
$\mathbf{v}_t \cdot x \ge 1$. For every edge $(v_r, v_s)$ we construct a vector $y_{rs} \in \mathbb{R}^d$ that belongs to
$h_r \cap h_s$ but not to $h_t$ for any $t \ne r, s$. The construction is as follows. First, write the polynomial
$(z - r)^2 (z - s)^2$ in the form $\sum_{i=0}^{4} a_i z^i$, and then put

\[ y_{rs} = \left( -\frac{a_1}{a_0}, -\frac{a_2}{a_0}, -\frac{a_3}{a_0}, -\frac{a_4}{a_0}, 0, \ldots, 0 \right) \in \mathbb{R}^d . \]

The inequality $y_{rs} \cdot \mathbf{v}_t \ge 1$ can be rewritten as $-\sum_{i=1}^{4} a_i t^i \ge a_0$ (using the fact that $a_0 = r^2 s^2 > 0$),
and it follows that the inequality is satisfied only when $(t - r)^2 (t - s)^2 \le 0$, i.e. only when $t \in \{r, s\}$.
Thus, the set system defined by the vectors $\{y_{rs}\}$ and the halfspaces $\{h_t\}$ is identical to the
Maximum Vertex Coverage instance defined by $G$.
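
The embedding can be verified with exact rational arithmetic. The following sketch assumes the edge point is $-(a_1, a_2, a_3, a_4)/a_0$, which is the form forced by the rewritten inequality in the proof; it works with the first four coordinates only, and the helper names `vertex_vector`, `edge_point` and `in_halfspace` are hypothetical.

```python
from fractions import Fraction
from typing import List


def vertex_vector(t: int) -> List[Fraction]:
    """Vector (t, t^2, t^3, t^4) associated with vertex v_t."""
    return [Fraction(t) ** i for i in range(1, 5)]


def edge_point(r: int, s: int) -> List[Fraction]:
    """Point y_rs built from the coefficients of (z - r)^2 (z - s)^2."""
    # Multiply out the polynomial; coeffs[i] is the coefficient a_i of z^i.
    coeffs = [Fraction(1)]
    for root in (r, r, s, s):
        shifted = [Fraction(0)] + coeffs                       # z * p(z)
        scaled = [-root * c for c in coeffs] + [Fraction(0)]   # -root * p(z)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    a0 = coeffs[0]  # equals r^2 s^2, positive for positive vertex indices
    return [-a / a0 for a in coeffs[1:]]


def in_halfspace(t: int, y: List[Fraction]) -> bool:
    """Test whether y lies in h_t, i.e. whether v_t . y >= 1."""
    return sum(v * c for v, c in zip(vertex_vector(t), y)) >= 1


# Every edge point should lie in exactly the two halfspaces of its endpoints.
n = 8
for r in range(1, n + 1):
    for s in range(r + 1, n + 1):
        y = edge_point(r, s)
        members = [t for t in range(1, n + 1) if in_halfspace(t, y)]
        assert members == [r, s]
```

Using `Fraction` rather than floating point matters here, because the endpoints satisfy the halfspace inequality with equality.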

To prove Part (c), we specialize to rectangular ranges in $\mathbb{R}^2$. (The case of rectangular ranges in
$\mathbb{R}^d$, $d > 2$ follows a fortiori.) To begin with, we observe that every bipartite instance of Maximum
Vertex Coverage can be represented using axis-parallel rectangles in $\mathbb{R}^2$. The construction
is as follows. If we label the vertices of the bipartite graph as $\{u_1, \ldots, u_p, w_1, \ldots, w_q\}$ such that
every edge has one endpoint in $\{u_1, \ldots, u_p\}$ and the other endpoint in $\{w_1, \ldots, w_q\}$, then we can
represent edge $(u_i, w_j)$ using the point $(2i, 2j) \in \mathbb{R}^2$. Vertex $u_i$ is represented by the rectangle
$[2i - 1, 2i + 1] \times [1, 2q + 1]$ and vertex $w_j$ by the rectangle $[1, 2p + 1] \times [2j - 1, 2j + 1]$. This
construction, combined with Lemma 5.1, suffices to show that the greedy algorithm has worst-case
approximation ratio $1 - \frac{1}{e}$ when specialized to rectangular ranges in $\mathbb{R}^2$. To prove APX-hardness,
we need to use a different reduction that is based on bounded-degree graphs rather than bipartite
graphs. We use the following theorem from [15]: there exists a constant $\Delta$ such that Vertex
Cover, restricted to graphs of maximum degree $\Delta$, is APX-hard.
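
As a quick sanity check of the bipartite representation, the sketch below confirms that the point for edge $(u_i, w_j)$ lies in the rectangles of $u_i$ and $w_j$ and in no others. The helper names and the sample graph are illustrative assumptions, not part of the paper.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # closed rectangle (x_min, x_max, y_min, y_max)


def contains(rect: Rect, point: Tuple[int, int]) -> bool:
    x_min, x_max, y_min, y_max = rect
    return x_min <= point[0] <= x_max and y_min <= point[1] <= y_max


def u_rect(i: int, q: int) -> Rect:
    """Vertical strip [2i-1, 2i+1] x [1, 2q+1] representing left vertex u_i."""
    return (2 * i - 1, 2 * i + 1, 1, 2 * q + 1)


def w_rect(j: int, p: int) -> Rect:
    """Horizontal strip [1, 2p+1] x [2j-1, 2j+1] representing right vertex w_j."""
    return (1, 2 * p + 1, 2 * j - 1, 2 * j + 1)


def representation_is_faithful(p: int, q: int, edges: List[Tuple[int, int]]) -> bool:
    """Check that each edge point is covered by exactly its two endpoint rectangles."""
    for i, j in edges:
        point = (2 * i, 2 * j)
        left_owners = [a for a in range(1, p + 1) if contains(u_rect(a, q), point)]
        right_owners = [b for b in range(1, q + 1) if contains(w_rect(b, p), point)]
        if left_owners != [i] or right_owners != [j]:
            return False
    return True


# Sample bipartite graph with p = q = 3; an edge (i, j) joins u_i and w_j.
sample_edges = [(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)]
faithful = representation_is_faithful(3, 3, sample_edges)
```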

For any graph $G$, create an instance of Maximum Coverage as follows. Assume that $G$
has vertex set $\{v_1, \ldots, v_n\}$ and edge set $\{e_1, \ldots, e_m\}$. For each edge $e_k$ with endpoints $v_i, v_j$, the
set $U \subset \mathbb{R}^2$ contains the three points $(6k - 2, 2i)$, $(6k, 0)$, $(6k + 2, 2j)$. These $3m$ points constitute
the entire set $U$. The rectangles in $\mathcal{R}$ are as follows. For each edge $e_k$ there are two rectangles
$r_1(e_k) = [6k - 3, 6k + 1] \times [-1, 2n + 1]$ and $r_2(e_k) = [6k - 1, 6k + 3] \times [-1, 2n + 1]$. For each vertex
$v_i$ there is one rectangle $r(v_i) = [0, 6m + 3] \times [2i - 1, 2i + 1]$. If $G$ has a vertex cover $C$ of size
$s$, then there is a set of $m + s$ rectangles in $\mathcal{R}$ that cover all the points in $U$: we take rectangle
$r(v_i)$ for each $v_i \in C$; this covers at least one of the points $(6k - 2, 2i)$, $(6k + 2, 2j)$ for each edge
$e_k$, and the remaining two points corresponding to that edge can be covered using either $r_1(e_k)$ or
$r_2(e_k)$. Conversely, if $U$ can be covered by $m + s$ elements of $\mathcal{R}$, then the covering must have a
subcollection of $m$ rectangles that contains one of the rectangles $r_1(e_k)$, $r_2(e_k)$ for each $k$. Let $T$
be the subset of $U$ that is not covered by this subcollection, and let $C$ be the set of all vertices $v_i$
such that $T$ contains a point whose $y$-coordinate is $2i$. It is easy to see that $C$ is a vertex cover of
$G$, and $|C| \le s$.
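
The forward direction of this reduction (a vertex cover of size $s$ yields $m + s$ covering rectangles) can be sketched as follows. The code assumes the edge rectangles span the full height $[-1, 2n + 1]$ so that they reach every vertex row $y = 2i$; the function names and the triangle example are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # closed rectangle (x_min, x_max, y_min, y_max)
Name = Tuple[str, int]


def build_instance(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[Point], Dict[Name, Rect]]:
    """Build the 3m points and 2m + n rectangles for a graph on vertices 1..n."""
    m = len(edges)
    points: List[Point] = []
    rects: Dict[Name, Rect] = {}
    for k, (i, j) in enumerate(edges, start=1):
        points += [(6 * k - 2, 2 * i), (6 * k, 0), (6 * k + 2, 2 * j)]
        # Assumed height [-1, 2n + 1], tall enough to reach every row y = 2i.
        rects[("r1", k)] = (6 * k - 3, 6 * k + 1, -1, 2 * n + 1)
        rects[("r2", k)] = (6 * k - 1, 6 * k + 3, -1, 2 * n + 1)
    for i in range(1, n + 1):
        rects[("v", i)] = (0, 6 * m + 3, 2 * i - 1, 2 * i + 1)
    return points, rects


def rectangles_from_cover(edges: List[Tuple[int, int]], cover: Set[int]) -> List[Name]:
    """Turn a vertex cover into m + |cover| rectangle names."""
    chosen: List[Name] = [("v", i) for i in sorted(cover)]
    for k, (i, j) in enumerate(edges, start=1):
        # If endpoint i is in the cover, its row covers the left point, so r2
        # picks up the middle and right points; otherwise j is in the cover.
        chosen.append(("r2", k) if i in cover else ("r1", k))
    return chosen


def covers_all(points: List[Point], rects: Dict[Name, Rect], chosen: List[Name]) -> bool:
    def inside(p: Point, r: Rect) -> bool:
        return r[0] <= p[0] <= r[1] and r[2] <= p[1] <= r[3]
    return all(any(inside(p, rects[name]) for name in chosen) for p in points)


# Triangle on vertices 1, 2, 3 with vertex cover {1, 2}.
edges = [(1, 2), (2, 3), (1, 3)]
points, rects = build_instance(3, edges)
chosen = rectangles_from_cover(edges, {1, 2})
all_covered = covers_all(points, rects, chosen)
```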

Now let $\varepsilon > 0$, $\Delta < \infty$ be chosen such that it is NP-hard to distinguish between graphs of
maximum degree $\Delta$ having a vertex cover of size $s$ (henceforth, yes instances) and those having no
vertex cover of size less than $(1 + \varepsilon)s$ (no instances). If $G$ is a yes instance, then the corresponding
Maximum Coverage instance with parameter $k = m + s$ has optimum value $3m$. If $G$ is a
no instance, then the corresponding Maximum Coverage instance with parameter $k = m + s$ has
optimum value at most $3m - \varepsilon s$. Indeed, if there exist $m + s$ rectangles that cover more than $3m - \varepsilon s$
points, then it is trivial to find fewer than $m + (1 + \varepsilon)s$ rectangles that cover all $3m$ points, which is
impossible if $G$ is a no instance. If it is possible for a graph with $m$ edges and maximum degree $\Delta$ to
have a vertex cover of size $s$, then $s \ge m/\Delta$. Thus, we have shown that a yes instance of Vertex
Cover maps to a Maximum Coverage instance whose optimum value is $3m$, while a no instance
maps to one whose optimum value is at most $(3 - \varepsilon/\Delta)m$, implying the claimed APX-hardness. □

B Two-dimensional halfspaces 

Theorem 5.2 rules out the possibility of designing a PTAS for Maximum Coverage specialized
to halfspaces in $\mathbb{R}^d$ for $d \ge 4$ (unless P = NP), and it likewise rules out the possibility of proving an
approximation ratio better than $1 - \frac{1}{e}$ for the greedy algorithm. But in very low dimensions, the
situation is different. When $d = 1$, it is easy to see that the greedy algorithm itself always computes
an optimal solution. When $d = 2$, a dynamic programming algorithm due to Har-Peled and Lee [8]
computes an optimal solution in polynomial time. (The algorithm given in that paper is for Set 
Cover rather than Maximum Coverage, but a trivial modification of their algorithm solves 
Maximum Coverage.) Despite the existence of a polynomial-time algorithm for two-dimensional 
Maximum Halfspace Coverage, it is interesting to investigate the approximation ratio of some 
other archetypical algorithms for this problem, especially since this investigation may shed light on 
the approximability of three-dimensional Maximum Halfspace Coverage, which is NP-hard [6] 
and hence the two-dimensional dynamic programming algorithm is unlikely to generalize. In this 
section, we show that when d = 2 the greedy algorithm has approximation ratio 3/4, and there is 
a natural local search algorithm yielding a PTAS. 

B.1 Analysis of the greedy algorithm

To analyze the greedy algorithm for Maximum Halfspace Coverage in two dimensions, we prove
that the covering multiplicity $r$ of the problem instance is 2. Theorem 4.2 then implies that the
greedy algorithm is a $\frac{3}{4}$-approximation algorithm for Maximum Halfspace Coverage in
two dimensions.

Lemma B.1. The covering multiplicity for Maximum Halfspace Coverage in two dimensions
is 2.

Proof. The proof is a series of simple observations.

(a) Without loss of generality we can assume that no set belongs to both the optimal solution and
the given solution $G$. Otherwise, we can duplicate the set and treat one copy as
belonging to the optimal solution while the other belongs to the given solution.

(b) Consider the optimal solution which has the maximum number of sets in common with the
duplicates created in the previous step.

(c) In the optimal solution $O$ considered above, for every other set $s \notin O$, there are two sets
$o_1, o_2 \in O$ such that every element of $s$ that is covered by $O$ belongs to $o_1 \cup o_2$. Otherwise,
using the fact that this is a two-dimensional Maximum Halfspace Coverage instance, we
can see that one of the previous two conditions is violated.

□ 

Theorem B.2. The greedy algorithm for two-dimensional Maximum Halfspace Coverage has
approximation ratio 3/4.

Proof. The proof follows easily from Lemma B.1 and Theorem 4.2. □

Example B.3. The following example shows that the analysis of the greedy algorithm is tight.
Consider the set system $s_1 = \{p_1, p_2\}$, $s_2 = \{p_3, p_4\}$ and $s_3 = \{p_1, p_3\}$ with $k = 2$. It is
simple to see that this can be realized as a two-dimensional instance of Maximum Halfspace
Coverage. One choice for the optimal sets is $s_1, s_2$, with value 4. One possible output of the greedy
algorithm is $s_3, s_1$, with value 3. This gives an approximation ratio of 3/4.
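
The tight example can be replayed in a few lines. In this sketch the set system is treated abstractly (the halfspace realization is not modeled), ties are broken in favor of the first set in dictionary order so that the unlucky choice $s_3$ is made first, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def greedy(sets: Dict[str, Set[str]], k: int) -> Tuple[List[str], int]:
    """Greedy maximum coverage; ties go to the first set in dictionary order."""
    covered: Set[str] = set()
    chosen: List[str] = []
    for _ in range(k):
        best = max(sets, key=lambda name: len(sets[name] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, len(covered)


def optimum(sets: Dict[str, Set[str]], k: int) -> int:
    """Brute-force optimum over all k-subsets (fine for tiny instances)."""
    return max(
        len(set().union(*(sets[name] for name in combo)))
        for combo in combinations(sets, k)
    )


# s3 is listed first so that the tie in the first greedy step resolves to it.
sets = {"s3": {"p1", "p3"}, "s1": {"p1", "p2"}, "s2": {"p3", "p4"}}
chosen, greedy_value = greedy(sets, 2)
best_value = optimum(sets, 2)
```

Here `greedy_value / best_value` equals 3/4, matching the bound of Theorem B.2.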
B.2 A PTAS via local search



If $(U, \mathcal{R}, k)$ is an instance of Maximum Coverage and $S = \{R_1, \ldots, R_k\}$ is a solution, define a
t-swap to be the operation of transforming this solution into another solution $S' = \{R'_1, \ldots, R'_k\}$
such that there are at most $t$ sets belonging to $S$ but not $S'$, and vice-versa. If $(U, \mathcal{R}, k)$ is a
two-dimensional instance of Maximum Halfspace Coverage and $S = \{R_1, \ldots, R_k\}$ is a solution,
define $\mathrm{area}(S) \subseteq \mathbb{R}^2$ to be the set $\bigcup_{i=1}^{k} h_i$, where $h_i$ is the halfspace corresponding to $R_i$.

In this section we analyze the following local search algorithm. We assume an unweighted 
instance of two-dimensional Maximum Halfspace Coverage, i.e. an instance in which each 
element has weight 1. 

1. Start with an arbitrary k-tuple of sets $S$.

2. While possible, do a t-swap to improve the number of elements covered.

3. If there exists a 1-swap to obtain a solution $S'$ such that $\mathrm{area}(S) \subseteq \mathrm{area}(S')$ and
$\mathrm{area}(S) \ne \mathrm{area}(S')$, then perform this 1-swap and go to step 2. Otherwise terminate the algorithm.

It is simple to see that step 3 does not run for more than n times without the solution improving 
because once a set is deleted from S in step 3, the only event that can re-insert it is a t-swap in 
step 2. 
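
Steps 1 and 2 can be sketched for an abstract set system as follows. This is a simplified illustration, not the full algorithm: step 3 (the area-increasing 1-swap) is omitted because it needs the geometry of the halfspaces, the initial solution is simply the first $k$ sets, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Set


def coverage(sets: List[Set[int]], chosen: List[int]) -> int:
    """Number of elements covered by the sets whose indices are in `chosen`."""
    return len(set().union(*(sets[i] for i in chosen)))


def improving_swap(sets: List[Set[int]], current: List[int], t: int) -> Optional[List[int]]:
    """Return a solution reachable by exchanging at most t sets that covers more, if any."""
    outside = [i for i in range(len(sets)) if i not in current]
    base = coverage(sets, current)
    for size in range(1, t + 1):
        for removed in combinations(current, size):
            kept = [i for i in current if i not in removed]
            for added in combinations(outside, size):
                candidate = kept + list(added)
                if coverage(sets, candidate) > base:
                    return candidate
    return None


def local_search(sets: List[Set[int]], k: int, t: int) -> List[int]:
    """Repeat improving t-swaps until none exists (steps 1 and 2 only)."""
    current = list(range(k))  # arbitrary starting k-tuple
    while True:
        candidate = improving_swap(sets, current, t)
        if candidate is None:
            return current
        current = candidate


# The set system of Example B.3, started from the bad solution {s3, s1}.
sets = [{1, 3}, {1, 2}, {3, 4}]
solution = local_search(sets, k=2, t=1)
value = coverage(sets, solution)
```

The loop terminates because each accepted swap strictly increases the number of covered elements. Each search for an improving swap enumerates on the order of $n^{2t}$ candidate exchanges, which is the source of the $n^{O(1/\varepsilon)}$ running time mentioned below.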

We will prove that this local search algorithm achieves an approximation ratio of $2t/(2t + 1)$.
(This implies that we can obtain a PTAS with running time $n^{O(1/\varepsilon)}$ by setting $t = 1/\varepsilon$.) The
proof of the approximation ratio is in two steps. We first assume the existence of a certain chain 
decomposition and prove that this implies a 2t/(2t + 1) approximation ratio. Then we construct 
such a chain decomposition. 

B.2.1 Approximation ratio assuming chain decomposition 

For succinctness, we will refer to the sets in the optimal solution and in the output of the local
search algorithm as opt sets and local sets, respectively. Opt sets will be denoted by $o_i$ and local
sets by $\ell_i$. If $L_i$ is a subcollection of the local sets, we will frequently use the notation $v(L_i)$ to
denote the set of elements covered by $L_i$ but not by any of the other local sets, i.e.

\[ v(L_i) = \Bigl( \bigcup_{\ell_j \in L_i} \ell_j \Bigr) - \Bigl( \bigcup_{\ell_j \notin L_i} \ell_j \Bigr). \]

Let the opt sets and local sets be grouped into chains $C_1, C_2, \ldots$ such that the following properties
are satisfied.

• The local and opt sets alternate (cyclically) in a chain $C_i$.

• Consider a portion of any chain (cyclically) $\ldots o_1 \ell_1 o_2 \cdots \ell_t o_{t+1} \ldots$. Let $L = \{\ell_1, \ell_2, \ldots, \ell_t\}$,
$O_1 = \{o_1, o_2, \ldots, o_t\}$ and $O_2 = \{o_2, o_3, \ldots, o_{t+1}\}$. Then $v(L) \cap OPT \subseteq v(L) \cap (O_1 \cup O_2)$.

Consider $L_i$ and $O_i$ each having the same number of sets, and at most $t$ sets. We derive some
inequalities based on local optimality.

\[ w(v(L_i) - OPT) + w(v(L_i) \cap OPT) \ge w(O_i - \mathrm{local}) + w(O_i \cap v(L_i)) \tag{9} \]
\[ \Rightarrow \; w(v(L_i) - OPT) + w(v(L_i) \cap (OPT - O_i)) \ge w(O_i - \mathrm{local}) \tag{10} \]

Now we choose the sets used in equation (9) and then add these equations to get the desired result.
• Consider chain $C_j$. If the number of local sets in $C_j$ is at most $t$, then let $L_j$ be the collection of all
local sets in $C_j$ and let $O_j$ be the collection of all opt sets in $C_j$. Make $2t$ such copies, i.e. the
same equation will be used $2t$ times in the proof.

• If chain $C_j$ has more than $t$ local sets, then let $L_j$ be any $t$ consecutive local sets,
and let $O_j$ be the opt sets in the chain which are shifted from $L_j$ by 1 either clockwise or
counterclockwise. Note that a particular choice for $L_j$ appears twice since there are two
options for $O_j$.

Here are some properties of the above decomposition. 

1. Each $\ell_i$ belongs to $2t$ of the sets $L_j$.

2. Each $o_i$ belongs to $2t$ of the sets $O_j$.

3. Let $L_i$ be associated with $O_j$ and $O_k$. Then $v(L_i) \cap OPT \subseteq v(L_i) \cap (O_j \cup O_k)$. This is just a
restatement of the assumed property of the chain decomposition.

Based on the above properties we derive the final inequality. Sum equation (9) over all pairs
$L_j, O_j$, and then bound each term in the sum. Let OPT denote the set of elements covered by the
opt sets, and let LOC denote the set of elements covered by the local sets.



• $\sum w(v(L_i) - OPT) \le 2t \cdot w(LOC - OPT)$. This is due to property 1.

• $\sum w(O_i - LOC) \ge 2t \cdot w(OPT - LOC)$. This is due to property 2. Note the difference in
the direction of the inequalities.

• $\sum w(v(L_i) \cap (OPT - O_i)) \le w(LOC \cap OPT)$. This is due to property 3.

From the above three inequalities and equation (9) we get the final inequality.

\begin{align*}
& 2t \cdot w(LOC - OPT) + w(LOC \cap OPT) \ge 2t \cdot w(OPT - LOC) \\
\Rightarrow \; & 2t \cdot w(LOC - OPT) + (2t + 1) \cdot w(LOC \cap OPT) \ge 2t \cdot w(OPT) \\
\Rightarrow \; & (2t + 1) \cdot w(LOC - OPT) + (2t + 1) \cdot w(LOC \cap OPT) \ge 2t \cdot w(OPT) \\
\Rightarrow \; & (2t + 1) \cdot w(LOC) \ge 2t \cdot w(OPT) \\
\Rightarrow \; & w(LOC) \ge \frac{2t}{2t + 1} \cdot w(OPT) \tag{11}
\end{align*}

B.2.2 Obtaining a chain decomposition 

We argue about some properties of two-dimensional Maximum Halfspace Coverage instances, 
based on which we get some associations. Consider the optimal solution which has the maximum 
number of sets in common with the output of the local search algorithm. 

1. For each $\ell_i$ there exist $o_j, o_l$ such that $\ell_i \cap OPT \subseteq o_j \cup o_l$. It is simple to see that if this is
not true then we can change the optimal solution so that the number of sets in common with
the local optimum increases. Now associate $\ell_i$ to the corresponding $o_j$ and $o_l$. Let $A^{init}(o_i)$
be the local sets associated with $o_i$ and $A^{init}(\ell_i)$ be the opt sets associated with $\ell_i$.

2. For each $o_i$ there exist $\ell_j, \ell_t \in A^{init}(o_i)$ such that $o_i \cap A^{init}(o_i) \subseteq \ell_j \cup \ell_t$. This is true due to
the modified form of local search used: otherwise we could change the local optimum
to increase the area. If $A^{init}(o_i)$ contains more than two sets, then among them choose
two sets $\ell_j, \ell_t$ such that $o_i \cap A^{init}(o_i) \subseteq \ell_j \cup \ell_t$, keep these associations, and remove the rest of the
associations for $o_i$. Let the new associations be called $A^{new}(\ell_i)$ and $A^{new}(o_i)$.
3. Note that in step 2 we remove some associations. Hence it might no longer be true that
$\ell_i \cap OPT \subseteq A^{new}(\ell_i)$. But also note that it is still true that $o_i \cap A^{init}(o_i) \subseteq A^{new}(o_i)$.

4. Note that due to steps 1 and 2, each local set is associated with at most two opt sets and
each opt set is associated with at most two local sets.

5. Now form maximal alternating pseudo chains, where a pseudo chain is a list of alternating
local and opt sets in which each local set is associated with its adjacent sets and
each opt set is associated with its adjacent sets.

6. There are three kinds of pseudo chains, depending on their endpoints: $\ell$-$\ell$, $o$-$o$, or
$o$-$\ell$ (here $\ell$-$\ell$ means a chain starting with a local set and ending with a
local set).

7. Merge each $\ell$-$\ell$ pseudo chain arbitrarily with an $o$-$o$ pseudo chain, so that only $o$-$\ell$ chains remain.

The $o$-$\ell$ chains thus obtained are the chains required in the analysis. It remains to prove that this
decomposition satisfies the required properties.

• By construction the local and opt sets alternate (cyclically) in a chain $C_i$.

• Consider a portion of the chain (cyclically) $\ldots o_1 \ell_1 o_2 \cdots \ell_t o_{t+1} \ldots$. Let $L = \{\ell_1, \ell_2, \ldots, \ell_t\}$,
$O_1 = \{o_1, o_2, \ldots, o_t\}$ and $O_2 = \{o_2, o_3, \ldots, o_{t+1}\}$. We need to prove that
$v(L) \cap OPT \subseteq v(L) \cap (O_1 \cup O_2)$. The proof is by contradiction, i.e. let $x \in v(L) \cap OPT$ but
$x \notin O_1 \cup O_2$. Then $x \in \ell_c$ for some $\ell_c \in L$. We follow through the associations.

- Consider the initial association. By its property, there exists $o_j \in A^{init}(\ell_c)$ such that $x \in o_j$.

- If $o_j \in A^{new}(\ell_c)$ then $o_j$ is adjacent to $\ell_c$ in the chain and hence $o_j \in \{o_1, o_2, \ldots, o_{t+1}\}$,
which contradicts the fact that $x \notin O_1 \cup O_2$.

- If $o_j \notin A^{new}(\ell_c)$ then by step 2 of the associations $o_j \cap A^{init}(o_j) \subseteq A^{new}(o_j)$. Hence there exists
$\ell_d \in A^{new}(o_j)$ such that $x \in \ell_d$, and $o_j$ and $\ell_d$ are adjacent in some chain. If $\ell_d \in \{\ell_1, \ell_2, \ldots, \ell_t\}$
then $o_j \in \{o_1, o_2, \ldots, o_{t+1}\}$ and we arrive at a contradiction with $x \notin O_1 \cup O_2$. Otherwise
$x \notin v(L)$ and we still arrive at a contradiction.